AITopics | ancient document

Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning

Yu, Haiyang, Wu, Yuchuan, Shi, Fan, Liao, Lei, Lu, Jinghui, Ge, Xiaodong, Wang, Han, Zhuo, Minghan, Wu, Xuecheng, Fei, Xiang, Feng, Hao, Tang, Guozhi, Wang, An-Lan, Zhu, Hanshen, He, Yangfan, Liang, Quanhuan, Meng, Liyuan, Feng, Chao, Huang, Can, Tang, Jingqun, Li, Bin

arXiv.org Artificial Intelligence, Sep-15-2025

Chinese ancient documents, invaluable carriers of millennia of Chinese history and culture, hold rich knowledge across diverse fields but face challenges in digitization and understanding: traditional methods only scan images, while current Vision-Language Models (VLMs) struggle with their visual and linguistic complexity. Existing document benchmarks focus on English printed texts or simplified Chinese, leaving a gap for evaluating VLMs on ancient Chinese documents. To address this, we present AncientDoc, the first benchmark for Chinese ancient documents, designed to assess VLMs from OCR to knowledge reasoning. AncientDoc includes five tasks (page-level OCR, vernacular translation, reasoning-based QA, knowledge-based QA, linguistic variant QA) and covers 14 document types, over 100 books, and about 3,000 pages. Based on AncientDoc, we evaluate mainstream VLMs using multiple metrics, supplemented by a human-aligned large language model for scoring. The benchmark is available at https://bytedance.github.io/AncientDoc.

large language model, machine learning, qwen2, (18 more...)

2509.09731

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Punctuation Restoration Model and Spacing Model for Korean Ancient Document

Jang, Taehong, Ahn, Joonmo, Kim, Sojung Lucia

arXiv.org Artificial Intelligence, Dec-19-2023

Korean ancient documents have no spacing or punctuation and are written in classical Chinese characters. This makes it challenging for modern readers and translation models to interpret and translate them accurately. While China has models that predict punctuation and spacing, applying them directly to Korean texts is problematic due to data differences. Therefore, we developed the first models that predict punctuation and spacing for Korean historical texts and evaluated their performance. Our punctuation restoration model achieved an F1 score of 0.84, and the spacing model achieved a score of 0.96. The models have the advantage of enabling inference on low-performance GPUs with less VRAM while maintaining high accuracy.
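For readers unfamiliar with the F1 score quoted in the abstract above, the following is a minimal sketch of how such a score could be computed for punctuation restoration. It is not the authors' evaluation code: the per-character label scheme, the `"O"` marker for "no punctuation", the micro-averaging, and the function name `punctuation_f1` are all assumptions made for illustration.

```python
from typing import Sequence


def punctuation_f1(gold: Sequence[str], pred: Sequence[str], no_punct: str = "O") -> float:
    """Micro-averaged F1 over per-character punctuation labels (illustrative only).

    Each position holds the punctuation mark that follows the character,
    or `no_punct` when nothing follows it. This label scheme is an assumption.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # A true positive is a position where the predicted mark matches the gold mark.
    true_pos = sum(1 for g, p in zip(gold, pred) if g == p and g != no_punct)
    pred_pos = sum(1 for p in pred if p != no_punct)  # marks the model emitted
    gold_pos = sum(1 for g in gold if g != no_punct)  # marks in the reference

    if true_pos == 0:
        return 0.0

    precision = true_pos / pred_pos
    recall = true_pos / gold_pos
    return 2 * precision * recall / (precision + recall)


# Hypothetical six-character passage: the model adds one spurious comma.
gold_labels = ["O", ",", "O", "O", "。", "O"]
pred_labels = ["O", ",", "O", ",", "。", "O"]
score = punctuation_f1(gold_labels, pred_labels)
```

In this toy case precision is 2/3 and recall is 1, so the F1 score is 0.8. The paper may average per punctuation class or use a different tokenization, which would change the numbers.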

chinese character, dataset, punctuation mark, (10 more...)

2312.11881

Country: Asia > China (0.25)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

DeepScribe AI Can Help Translate Ancient Tablets

#artificialintelligence, Mar-22-2020, 22:26:26 GMT

Researchers from the University of Chicago's Oriental Institute and the Department of Computer Science have collaborated to design an AI that can help decode tablets from ancient civilizations. According to Phys.org, the AI is called DeepScribe and was trained on over 6,000 annotated images pulled from the Persepolis Fortification Archive. When it is complete, the AI model will be able to interpret unanalyzed tablets, making it easier to study ancient documents. Experts who study ancient documents, like the researchers studying documents created during the Achaemenid Empire in Persia, need to translate them by hand, a long process that is prone to errors. Researchers have been using computers to assist in interpreting ancient documents since the 1990s, but the programs used were of limited help. The complex cuneiform characters, as well as the three-dimensional shape of the tablets, limited how useful those programs could be.

ancient document, computer science, help translate ancient tablet, (10 more...)

Country: North America > United States > Illinois > Cook County > Chicago (0.29)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
